
phar: cap decompression output against declared uncompressed_filesize #21806

Open
iliaal wants to merge 1 commit into php:PHP-8.4 from iliaal:fix/phar-decompression-bounds

Conversation

@iliaal (Contributor) commented Apr 18, 2026

Prevents decompression amplification in phar_open_entry_fp().

The function decompressed entries by streaming compressed_filesize bytes through a filter into a tmpfile, then checked the output size after the full copy. A crafted phar whose compressed data expanded beyond uncompressed_filesize could write large amounts to disk before the check ran.

Replaces the single php_stream_copy_to_stream_ex() call with an 8 KiB chunked loop that flushes the decompression filter and checks the running output size after each chunk. Aborts when output exceeds uncompressed_filesize.
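The chunk-and-check pattern is easy to see outside the PHP stream layer. Below is a minimal Python sketch using the stdlib bz2 module (the 8 KiB chunk mirrors the patch; the function name and error handling are illustrative, not the PHP code): the decompressor's max_length argument bounds how much output each step may emit, so the running total can be checked before the next chunk is produced.

```python
import bz2

def decompress_capped(compressed: bytes, declared_size: int, chunk: int = 8192) -> bytes:
    """Decompress, aborting as soon as output exceeds declared_size.

    Sketch of the patch's chunked loop: bound each step's output and check
    the running total, instead of one bulk copy followed by a single
    after-the-fact size check.
    """
    d = bz2.BZ2Decompressor()
    out = bytearray()
    data = compressed
    while not d.eof:
        # Feed input only when the decompressor asks for it; otherwise
        # drain output it has already buffered internally.
        piece = d.decompress(data if d.needs_input else b"", max_length=chunk)
        data = b""
        out += piece
        if len(out) > declared_size:
            raise ValueError("output exceeds declared uncompressed_filesize")
        if not piece and not d.eof:
            raise ValueError("truncated or stalled stream")
    return bytes(out)
```

With an honest declared size the loop completes normally; with a lying one it aborts after the first over-budget chunk instead of materializing the full payload.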

Comment thread ext/phar/util.c
Comment on lines -929 to 944
-	if (SUCCESS != php_stream_copy_to_stream_ex(phar_get_entrypfp(entry), ufp, entry->compressed_filesize, NULL)) {
-		spprintf(error, 4096, "phar error: internal corruption of phar \"%s\" (actual filesize mismatch on file \"%s\")", phar->fname, entry->filename);
-		php_stream_filter_remove(filter, 1);
-		return FAILURE;
+	size_t remaining = entry->compressed_filesize;
+	while (remaining > 0) {
+		size_t chunk = remaining < 8192 ? remaining : 8192;
+		size_t n = 0;
+		if (SUCCESS != php_stream_copy_to_stream_ex(phar_get_entrypfp(entry), ufp, chunk, &n)) {
+			goto decompression_error;
+		}
+		if (n == 0) {
+			break;
+		}
+		remaining -= n;
+		php_stream_filter_flush(filter, 0);
+		if (php_stream_tell(ufp) - loc > (zend_off_t) entry->uncompressed_filesize) {
+			goto decompression_error;
+		}
+	}
Member
I don't understand this whole logic nor the point of it. Surely it suffices to pass the size_t n as a pointer and check that the copy succeeds and the size is the same as compressed_filesize. No need for this looping nonsense?

Contributor Author

Ouch, "looping nonsense" 😄

The n returned by php_stream_copy_to_stream_ex is bytes read from the source (see *len = haveread at main/streams/streams.c:1831), not bytes written to the destination after the decompression filter runs. Checking n == compressed_filesize just confirms we consumed the declared compressed length; it tells us nothing about how much decompressed data landed on ufp.

The underlying issue is amplification: a phar entry declares e.g. compressed_filesize=200, uncompressed_filesize=1024, but those 200 bytes decompress to gigabytes. The pre-patch code already had a post-copy size check (php_stream_tell(ufp) - loc != uncompressed_filesize), but it only fires after the single bulk copy has already flushed the full decompressed payload to tmpfile. The goal of this patch is to bound what hits disk, not to detect the mismatch.

The only signal that bounds filter output is the write position on ufp. The chunked loop checks php_stream_tell(ufp) - loc > uncompressed_filesize after each 8 KiB chunk, with php_stream_filter_flush(filter, 0) per iteration so tell() reflects the filter output rather than still-buffered bytes. This looked like the simplest solution, but maybe there are others; a single copy_to_stream_ex with an n check won't catch this.
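The consumed-versus-emitted gap described above is easy to demonstrate with any decompression API; here is a Python stdlib sketch (not the PHP streams code):

```python
import zlib

# One megabyte of zeros compresses down to roughly a kilobyte...
comp = zlib.compress(b"\x00" * 1_000_000)

# ...so "bytes moved in" (what a copy loop can observe on the source side)
# wildly understates "bytes emitted" (what lands on the destination after
# the decompression filter runs).
out = zlib.decompress(comp)
print(f"{len(comp)} bytes in -> {len(out)} bytes out")
```

A check against the input count (the analogue of n == compressed_filesize) would pass here no matter how large the output grew.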

Member

The description of _php_stream_copy_to_stream_ex says that it sets *len to the number of bytes moved. So are you telling me this is a lie? Or is the problem that we use a stream filter to do the decompression (which is maybe the issue...)?

I'm not sure I see the point in preventing this from hitting the disk if we are only writing to a temporary file. Unless this is somehow a security issue, this seems to do extra work and complicate the code for little benefit.

Contributor Author

The docstring isn't lying, but with a write filter on the destination, "bytes moved" means bytes fed into the filter (== compressed_filesize on success), not bytes the filter emits to the underlying file. Your 2nd point is the right one: because decompression runs as a filter, *len can't report post-filter output, so the caller has no way to learn the amplified size from a single copy_to_stream_ex call.

On the security side: technically a DoS from a malicious phar is possible here, though a low-quality one given the lack of persistence. More of a bug with security implications than an advisory imho, hence filed as a bug. A phar entry with compressed_filesize=200, uncompressed_filesize=1024 whose 200 bytes decompress to several GiB will write several GiB to the tmpfile before the existing post-copy check fires. An attacker can exhaust /tmp or swap with a handful of KiB of input against any service that opens entries from an untrusted phar (composer-style tooling, CI pipelines scanning phars, upload processors). The "filesize mismatch" path was designed to flag corruption, not to bound amplification, so it fires after the damage.

A check that runs during the copy is still required; there's no way to bound amplification after the fact.

@iliaal iliaal requested a review from Girgias April 20, 2026 15:27
@ndossche (Member)

So what does this actually fix?
What's the attack model? If the attacker can control the phar file, which you assume here, it doesn't matter if they lie about the file size or not, they can make a legitimate phar file that will decompress to gigabytes of data, without the check firing.
NAK from me.

@iliaal (Contributor, Author) commented Apr 20, 2026

Yes, the attacker would need control of the phar for this to matter, which as you said is already a risky position; hence this is not an advisory, just a potential issue.

Granted, the proposed solution has a potential performance cost from reading in small (typical PHP-sized) buffers. Perhaps a deeper fix inside php_stream_copy_to_stream_ex could check for oddities like a 1:100 decompression ratio, which would be very unusual, etc... but that is probably a higher-complexity fix.

That said, this is still a concern given the amplification aspect; it could in theory be chained with something else so that a few bytes of input cause real trouble. I do think it should be addressed, but I can see the reasoning behind the NAK (a first for me).
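The ratio heuristic floated above can be sketched in a few lines of Python (stdlib only; the threshold, names, and chunking are hypothetical illustrations, and the real change would live in C inside the streams layer): track bytes consumed against bytes emitted and abort once the running ratio becomes implausible.

```python
import zlib

def inflate_with_ratio_guard(comp: bytes, max_ratio: int = 100, chunk: int = 8192) -> bytes:
    """Abort when output exceeds max_ratio times the input consumed so far."""
    d = zlib.decompressobj()
    out = bytearray()
    buf = comp
    consumed = 0
    while buf and not d.eof:
        before = len(buf)
        out += d.decompress(buf, chunk)     # max_length caps output per call
        buf = d.unconsumed_tail             # input the decompressor didn't use yet
        consumed += before - len(buf)
        if len(out) > max_ratio * max(consumed, 1):
            raise ValueError("implausible decompression ratio")
    return bytes(out)
```

A fixed ratio is a blunt instrument (legitimate highly compressible data exists), which is part of why an explicit caller-supplied cap is the more general answer.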

@ndossche (Member)

Perhaps a deeper fix inside php_stream_copy_to_stream_ex could check for oddities like a 1:100 decompression ratio... but that is probably a higher-complexity fix.

I agree that if we want to stop amplification attacks (which is difficult to do generally), a more general solution is desirable rather than doing this in many different places at once.

Adds an optional max_output parameter to zlib.inflate and
bzip2.decompress filters. When set, the filter tracks bytes emitted
and aborts with PSFS_ERR_FATAL once the cap is exceeded, stopping
decompression amplification mid-stream instead of after the full
payload has landed on disk.

phar_open_entry_fp() passes entry->uncompressed_filesize as the cap
so a lying phar (small declared size, payload that decompresses
beyond it) fails during streaming. The previous post-copy filesize
mismatch check is retained to catch under-decompression.

The parameter is opt-in: omitting the key keeps existing behavior for
all current callers.
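The opt-in semantics can be mimicked in a Python sketch (stdlib bz2; the max_output name echoes the commit message, while the class and OverflowError are illustrative stand-ins for the C filter and PSFS_ERR_FATAL): with the cap omitted, behavior is unchanged; with a cap, decompression fails mid-stream.

```python
import bz2

class CappedDecompressor:
    """Sketch of an opt-in output cap, echoing the proposed max_output
    filter parameter. max_output=None keeps plain, uncapped behavior."""

    def __init__(self, max_output=None):
        self._d = bz2.BZ2Decompressor()
        self._emitted = 0
        self._max = max_output

    def decompress(self, data: bytes, chunk: int = 8192) -> bytes:
        out = bytearray()
        while not self._d.eof:
            piece = self._d.decompress(
                data if self._d.needs_input else b"", max_length=chunk)
            data = b""
            out += piece
            self._emitted += len(piece)
            if self._max is not None and self._emitted > self._max:
                # analogue of the filter aborting with PSFS_ERR_FATAL
                raise OverflowError("max_output exceeded")
            if not piece and not self._d.eof:
                break  # caller must supply more input
        return bytes(out)
```

Because the cap is checked as bytes are emitted, a payload that decompresses far beyond its declared size fails after at most one over-budget chunk, while existing uncapped callers see no behavioral change.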


3 participants